
Contextual semibandits via supervised learning oracles

Krishnamurthy, Akshay, Agarwal, Alekh, Dudik, Miroslav

Neural Information Processing Systems

We study an online decision making problem where on each round a learner chooses a list of items based on some side information, receives a scalar feedback value for each individual item, and a reward that is linearly related to this feedback. These problems, known as contextual semibandits, arise in crowdsourcing, recommendation, and many other domains. This paper reduces contextual semibandits to supervised learning, allowing us to leverage powerful supervised learning methods in this partial-feedback setting. Our first reduction applies when the mapping from feedback to reward is known and leads to a computationally efficient algorithm with near-optimal regret. We show that this algorithm outperforms state-of-the-art approaches on real-world learning-to-rank datasets, demonstrating the advantage of oracle-based algorithms. Our second reduction applies to the previously unstudied setting when the linear mapping from feedback to reward is unknown. Our regret guarantees are superior to prior techniques that ignore the feedback.
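The interaction protocol described in this abstract (on each round, observe a context, choose a list of items, receive scalar feedback per item, and a reward linearly related to that feedback) can be sketched as follows. This is an illustrative toy only: the uniform-random policy, the Bernoulli click model, and all variable names are assumptions, not the paper's algorithm, and the reward here uses the simple known mapping (sum of per-item feedback).

```python
import numpy as np

rng = np.random.default_rng(0)

n_items, list_len, n_rounds = 10, 3, 5
true_quality = rng.uniform(size=n_items)  # hypothetical per-item click probabilities

total_reward = 0.0
for t in range(n_rounds):
    context = rng.normal(size=4)  # side information (ignored by this toy policy)
    # Learner chooses a list of distinct items; a real algorithm would use the context.
    chosen = rng.choice(n_items, size=list_len, replace=False)
    # Semibandit feedback: one scalar observation per chosen item (e.g., click or not).
    feedback = rng.binomial(1, true_quality[chosen])
    # Reward is linear in the per-item feedback; here, the known mapping is a plain sum.
    reward = feedback.sum()
    total_reward += reward
```

A learning algorithm would replace the random choice of `chosen` with a policy selected via a supervised learning oracle, which is the reduction the paper studies.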


Contextual Semibandits via Supervised Learning Oracles

Krishnamurthy, Akshay, Agarwal, Alekh, Dudik, Miroslav

arXiv.org Machine Learning

Decision making with partial feedback, motivated by applications including personalized medicine [22] and content recommendation [17], is receiving increasing attention from the machine learning community. These problems are formally modeled as learning from bandit feedback, where a learner repeatedly takes an action and observes a reward for the action, with the goal of maximizing reward. While bandit learning captures many problems of interest, several applications have additional structure: the action is combinatorial in nature and more detailed feedback is provided. For example, in internet applications, we often recommend sets of items and record information about the user's interaction with each individual item (e.g., click). This additional feedback is unhelpful unless it relates to the overall reward (e.g., number of clicks), and, as in previous work, we assume a linear relationship. This interaction is known as the semibandit feedback model. Typical bandit and semibandit algorithms achieve reward that is competitive with the single best fixed action, i.e., the best medical treatment or the most popular news article for everyone. This is often inadequate for recommendation applications: while the most popular articles may get some clicks, personalizing content to the users is much more effective.


Temporally Expressive Planning Based on Answer Set Programming with Constraints

Bao, Forrest Sheng (Texas Tech University) | Zhang, Yuanlin (Texas Tech University)

AAAI Conferences

Recently, a new language AC(C) was proposed to integrate answer set programming (ASP) and constraint logic programming (CLP). In this paper, we show that temporally expressive planning problems in PDDL2.1 can be translated into AC(C) and solved using AC(C) solvers. Compared with existing approaches, the new approach places fewer restrictions on the planning problems and is easy to extend with new features such as PDDL axioms. It can also leverage the inference engine for AC(C), which has the potential to exploit the best reasoning mechanisms developed in the ASP, SAT, and CP communities.